(57) Abstract: Disclosed is a method (900) for communicating at least part of a structure of a document (104) described by a
hierarchical representation (102). The method identifies (902) the hierarchical representation (eg. the tree structure) of the document
(104). The identification is preferably performed using XML tags. The representation is then packetized (906) into a plurality of data
packets. At least one link is then created (908) between a pair of the packets, the link acting to represent an interconnection between
corresponding components (eg. structure and content) of the representation. The packets are then formed (910) into a stream for
communication. The links maintain the hierarchical representation within the packets.
XML ENCODING SCHEME
Field of the Invention
The present invention relates to the encoding of XML (Extensible Markup
Language) documents and, in particular, to at least one of the compression, streaming,
searching and dynamic construction of XML documents.

Background 

To make streaming, downloading and storing MPEG-7 descriptions more efficient,
the description can be encoded and compressed. An analysis of a number of issues
relating to the delivery of MPEG-7 descriptions has involved considering the format to be
used for binary encoding. Existing encoding schemes for XML, including the WBXML
proposal from WAP (the Wireless Application Protocol Forum), the Millau algorithm and
the XMill algorithm, have each been considered.

With WBXML, frequently used XML tags, attributes and values are assigned a
fixed set of codes from a global code space. Application specific tag names, attribute
names and some attribute values that are repeated throughout document instances are
assigned codes from some local code spaces. WBXML preserves the structure of XML
documents. The content as well as attribute values that are not defined in the Document
Type Definition (DTD) can be stored in-line or in a string table. It is expected that tables
of the document's code spaces are known to the particular class of applications or are
transmitted with the document.

While WBXML tokenizes tags and attributes, there is no compression of the textual
content. Whilst such is probably sufficient for the Wireless Markup Language (WML)
documents proposed for use under the WAP, for which WBXML is designed and which
usually have limited textual content, WBXML is not considered to be a very
efficient encoding format for typical text-laden XML documents. The Millau approach
extends the WBXML encoding format by compressing text using a traditional text
compression algorithm. Millau also takes advantage of the schema and datatypes to
enable better compression of attribute values that are of primitive datatypes.
The authors of the XMill algorithm have presented an even more complex encoding
scheme, although such was not based on WBXML. Apart from separating structure and
text encoding and using type information in DTD and schema for encoding values of
built-in datatypes, that scheme also:

(i) grouped elements of the same or related types into containers (to increase
redundancy),

(ii) compressed each container separately using a different compressor,

(iii) allowed atomic compressors to be combined into more complex ones, and

(iv) allowed the use of new specialized compressors for highly specialized
datatypes.

Nevertheless, existing encoding schemes are only designed for compression. They
do not support the streaming of XML documents. In addition, elements still cannot be
located efficiently using the XPath/XPointer addressing scheme and a document cannot be
encoded incrementally as it is being constructed.

Summary of the Invention 
In accordance with one aspect of the present disclosure, there is provided a method
of communicating at least part of a structure of a document described by a hierarchical
representation, said method comprising the steps of:

identifying said representation of said document;
packetizing said representation into a plurality of data packets, said packets having
a predetermined size, said packetizing comprising creating at least one link between a pair
of said packets, said link representing an interconnection between corresponding
components of said representation; and
forming said data packets into a stream for communication wherein said links
maintain said representation within said packets.

In accordance with another aspect of the present disclosure, there is provided a
method of communicating at least part of the structure of a document described by a
hierarchical representation, said method comprising the steps of:

identifying at least one part of said representation and packetizing said parts into at
least one packet of predetermined size, characterised in that where any one or more of said
parts of said representation do not fit within one said packet, defining at least one link
from said one packet to at least one further said packet into which said non-fitting parts
are packetized, said link maintaining the hierarchical structure of said document in said
packets.

In accordance with another aspect of the present disclosure, there is provided a
method of facilitating access to the structure of an XML document, said method
comprising the steps of:

identifying a hierarchical representation of said document;
packetizing said representation into a plurality of packets of predetermined packet
size;
forming links between said packets to define those parts of said representation not
able to be expressed within a packet thereby enabling reconstruction of said
representations after de-packetizing.
The presently disclosed encoding and decoding schemes separate structure and text
encoding and use the schema and datatypes for encoding values of built-in datatypes. In
addition, the disclosure provides support for streaming and allows efficient searching
using an XPath/XPointer-like addressing mechanism. Such also allows an XML document to
be encoded and streamed as it is being constructed. These features are important for
broadcasting and mobile applications. The presently disclosed encoding scheme also
supports multiple namespaces and provides EBNF definitions of the bitstream and a set of
interfaces for building an extensible encoder.

Brief Description of the Drawings 
One or more embodiments of the present invention will now be described with
reference to the drawings and Appendix, in which:

Fig. 1 schematically depicts an encoded XML document;
Fig. 2 depicts the organization of the structure segment;
Fig. 3 schematically depicts the encoder model;
Fig. 4 schematically depicts the decoder model;
Fig. 5 schematically illustrates the encoder encoding an XML document
incrementally into multiple packets;
Figs. 6A and 6B show how node locators are used for linking a node to its sub-trees
in other structure packets and how each node locator contains the packet number of a
sub-tree's packet;
Fig. 7 schematically depicts how a long string is stored as string fragments in
multiple text packets with each packet pointing to the text packet that contains the next
fragment;
Fig. 8 is a schematic block diagram representation of a computer system with
which the described arrangements may be implemented;
Fig. 9 is a flowchart of an XML document encoding operation;
Fig. 10 is a flowchart illustrating how different data types can be handled in the
encoding operations; and
the Appendix provides a definition useful for the encoded bitstream and the parameters
thereof.

Detailed Description 

The methods of encoding and decoding XML documents to be described with
reference to Figs. 1 to 7 and 9 and 10 are preferably practiced using a general-purpose
computer system 800, such as that shown in Fig. 8 wherein the processes of Figs. 1 to 7
may be implemented as software, such as an application program executing within the
computer system 800. In particular, the steps of the methods may be effected by
instructions in the software that are carried out by the computer. The software may be
divided into two separate parts: one part for carrying out the encoding/decoding methods,
and another part to manage the user interface between the encoding/decoding methods and
the user. The software may be stored in a computer readable medium, including the
storage devices described below, for example. The software is loaded into the computer
from the computer readable medium, and then executed by the computer. A computer
readable medium having such software or computer program recorded on it is a computer
program product. The use of the computer program product in the computer preferably
effects an advantageous apparatus for encoding/decoding XML documents.

The computer system 800 comprises a computer module 801, input devices such
as a keyboard 802 and mouse 803, output devices including a printer 815 and a display
device 814. A Modulator-Demodulator (Modem) transceiver device 816 is used by the
computer module 801 for communicating to and from a communications network 820, for
example connectable via a telephone line 821 or other functional medium. The
modem 816 can be used to obtain access to the Internet, and other network systems, such
as a Local Area Network (LAN) or a Wide Area Network (WAN). As seen, a server
computer system 850 connects to the network 820 enabling communications with the
computer system 800. The server computer 850 typically has a similar structure and/or is
operable in a like or complementary fashion to the computer system 800. For example,
whilst the computer system 800 may perform an XML encoding function, the server
computer 850 may perform a complementary XML decoding function, and vice versa.

The computer module 801 typically includes at least one processor unit 805, a
memory unit 806, for example formed from semiconductor random access memory
(RAM) and read only memory (ROM), input/output (I/O) interfaces including a video
interface 807, and an I/O interface 813 for the keyboard 802 and mouse 803 and optionally
a joystick (not illustrated), and an interface 808 for the modem 816. A storage device 809
is provided and typically includes a hard disk drive 810 and a floppy disk drive 811. A
magnetic tape drive (not illustrated) may also be used. A CD-ROM drive 812 is typically
provided as a non-volatile source of data. The components 805 to 813 of the computer
module 801 typically communicate via an interconnected bus 804 and in a manner which
results in a conventional mode of operation of the computer system 800 known to those in
the relevant art. Examples of computers on which the described arrangements can be
practised include IBM-PC's and compatibles, Sun Sparcstations or alike computer systems
evolved therefrom.
Typically, the application program is resident on the hard disk drive 810 and read
and controlled in its execution by the processor 805. Intermediate storage of the program
and any data fetched from the network 820 may be accomplished using the semiconductor
memory 806, possibly in concert with the hard disk drive 810. In some instances, the
application program may be supplied to the user encoded on a CD-ROM or floppy disk
and read via the corresponding drive 812 or 811, or alternatively may be read by the user
from the network 820 via the modem device 816. Still further, the software can also be
loaded into the computer system 800 from other computer readable media including
magnetic tape, a ROM or integrated circuit, a magneto-optical disk, a radio or infra-red
transmission channel between the computer module 801 and another device, a computer
readable card such as a PCMCIA card, and the Internet and Intranets including e-mail
transmissions and information recorded on Websites and the like. The foregoing is merely
exemplary of relevant computer readable media. Other computer readable media may
alternately be used.

In operation the XML document encoding/decoding functions are performed on
one of the server computer 850 or the computer system 800, and the packetized bit stream
so formed is transmitted over the communications network 820 for reception and decoding
by the computer system 800 or server computer 850 respectively, as the case may be. In
this fashion an XML document may be conveniently communicated between two locations
in an efficient manner whilst affording optimal time at the receiver to decode the
document on-the-fly as it is received without a need to first receive the entire document.

The methods of encoding and decoding may alternatively be implemented in part
or in whole by dedicated hardware such as one or more integrated circuits performing the
functions or sub-functions of encoding and/or decoding. Such dedicated hardware may
include graphic processors, digital signal processors, or one or more microprocessors and
associated memories.
Encoding and Compressing XML 
Separating structure and text 
Traditionally, XML documents are mostly stored and transmitted in their raw
textual format. In some applications, XML documents are compressed using some
traditional text compression algorithms for storage or transmission, and decompressed
back into XML before they are parsed and processed.

According to the present disclosure, another way for encoding an XML document
is to encode the tree hierarchy of the document (such as the DOM representation of the
document). The encoding may be performed in a breadth-first or depth-first manner. To
make the compression and decoding more efficient, the XML structure, denoted by tags
within the XML document, can be separated from the text of the XML document and
encoded. When transmitting the encoded document, the structure and the text can be sent
in separate streams or concatenated into a single stream.

As seen in Fig. 1 and according to the instant embodiment, a tree
representation 102 of an XML document 104, which is typically available from memory,
includes a number of nodes 116 and is encoded in a depth-first fashion. The structure of
the document 104 and the text contained therein can be encoded as two separate streams
106 and 108 respectively as shown, or concatenated into a single stream. The structure
stream 106 is headed by the code tables 110 and 114. The encoded nodes 118 of the
tree 102 each have a size field (not illustrated) that indicates the size of the node and
includes the total size of its descendant nodes. Some of the encoded leaf nodes 118
contain links 112 that link those leaf nodes to their corresponding encoded content in the
text stream 108. Each encoded string in the text stream 108 is headed by a size field (not
illustrated) that indicates the size of the string. Where concatenated into a single stream,
packets containing the root of the links 112 should precede those packets containing the
text pointed to by the links 112, thereby ensuring that the structure component of the
document 104 is received by the decoder before the corresponding text (content)
component.
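The arrangement of Fig. 1 may be illustrated by the following minimal sketch, which is not the bitstream format of the Appendix. It assumes a hypothetical layout in which each encoded node carries a 4-byte size field (covering the node and its descendants), an in-line element name in place of a code-table code, and a 4-byte link holding the offset of the node's string in the text stream. The names encode, read_text and NO_LINK are introduced purely for illustration.

```python
import struct
import xml.etree.ElementTree as ET

NO_LINK = 0xFFFFFFFF  # assumed marker meaning "this node has no text"


def encode(root):
    """Encode an element tree depth-first into (structure, text) byte strings."""
    text = bytearray()

    def encode_node(elem):
        name = elem.tag.encode("utf-8")
        content = (elem.text or "").strip().encode("utf-8")
        if content:
            link = len(text)  # offset of this node's string in the text stream
            # Each string in the text stream is headed by its own size field.
            text.extend(struct.pack(">I", len(content)) + content)
        else:
            link = NO_LINK
        children = b"".join(encode_node(child) for child in elem)
        body = struct.pack(">B", len(name)) + name + struct.pack(">I", link) + children
        # The node's size field covers the node and all of its descendants.
        return struct.pack(">I", len(body)) + body

    return encode_node(root), bytes(text)


def read_text(text, link):
    """Follow a link from the structure stream into the text stream."""
    (size,) = struct.unpack_from(">I", text, link)
    return text[link + 4:link + 4 + size].decode("utf-8")
```

Because every node begins with the total size of its subtree, a decoder working on such a layout could skip an unwanted element and all of its descendants with a single offset addition, without touching the text stream.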

The approach shown in Fig. 1 is also depicted in Fig. 9 as a flowchart of an
encoding method 900 which may be implemented as a software program running on the
computer system 800. The method 900 communicates at least part of a structure of a
document described by a hierarchical representation and has an entry step 902. Initially, at
step 904, the method 900 identifies the hierarchical representation (eg. the tree structure)
of the document 104. The identification is preferably performed using the XML tags as
mentioned above. With this, at step 906 the representation is packetized into a plurality of
data packets. At step 908, at least one link is created between a pair of the packets, the
link acting to represent an interconnection between corresponding components (eg.
structure and content) of the representation. In step 910, the packets are formed into a
stream for communication. The links maintain the hierarchical representation within the
packets. The method 900 ends at step 912.
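Steps 906 and 908 may be sketched as follows, under simplifying assumptions that are not taken from the disclosure: the tree is held as nested (name, children) tuples, the predetermined packet size is counted in records rather than bytes, and a link is simply the number of the packet whose first record is the root of the linked sub-tree. The functions packetize and rebuild are hypothetical.

```python
def packetize(tree, max_records):
    """Split a (name, children) tree into packets of at most max_records records."""
    packets = [[]]

    def place(node, pkt):
        name, children = node
        record = {"name": name, "children": []}
        packets[pkt].append(record)
        index = len(packets[pkt]) - 1
        for child in children:
            if len(packets[pkt]) < max_records:
                # The child still fits: store it in the same packet.
                record["children"].append(("local", place(child, pkt)))
            else:
                # The child does not fit: start a further packet and link to it.
                packets.append([])
                new_pkt = len(packets) - 1
                place(child, new_pkt)
                record["children"].append(("link", new_pkt))
        return index

    place(tree, 0)
    return packets


def rebuild(packets, pkt=0, index=0):
    """Reconstruct the tree by following local references and inter-packet links."""
    record = packets[pkt][index]
    children = []
    for kind, ref in record["children"]:
        if kind == "local":
            children.append(rebuild(packets, pkt, ref))
        else:
            children.append(rebuild(packets, ref, 0))
    return (record["name"], children)
```

In this sketch the links alone are sufficient to restore the hierarchy after de-packetizing, which is the property the method relies upon.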

In general, the volume of structural information is much smaller than that of
textual content. Structures are usually nested and repeated within a document instance.
Separating structure from text allows any repeating patterns to be more readily identified
by the compression algorithm which, typically, examines the input stream through a
fixed-size window. In addition, the structure and the text streams have rather different
characteristics. Hence, different and more efficient encoding methods may be applied to
each of the structure and text.

The structure is critical in providing the context for interpreting the text.
Separating structure and text in an encoder allows the corresponding decoder to parse the
structure of the document more quickly thereby processing only the relevant elements
while ignoring elements (and descendants) that it does not know or require. The decoder
may even choose not to buffer the text associated with any irrelevant elements. Whether
the decoder converts the encoded document back into XML or not depends on the
particular application to be performed (see the discussion below on Application Program
Interfaces - API's).
Code tables 

The elements of a document description and their attributes are defined in DTD's
or schemas. Typically, a set of elements and their associated attributes are repeatedly used
in a document instance. Element names as well as attribute names and values can be
assigned codes to reduce the number of bytes required to encode them.

Typically, each application domain uses a different set of elements and types
defined in a number of schemas and/or DTD's. In addition, each schema or DTD may
contain definitions for a different namespace. Even if some of the elements and types are
common to multiple classes of applications, they are usually used in a different pattern.
That is, an element X, common to both domains A and B, may be used frequently in
domain A, but rarely in domain B. In addition, existing schemas are updated and new
schemas are created all the time. Hence, it is best to leave the code assignment to
organisations that oversee interoperability in their domains. For instance, MPEG-7
descriptions are XML documents. MPEG may define the codespaces for its own
descriptors and description schemes as well as external elements and types that are used by
them. MPEG may also define a method for generating codespaces. Ideally, the method
should be entropy based - that is, based on the number of occurrences of the descriptors
and description schemes in a description or a class of description (see the section on
generating codespaces).
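One simple occurrence-based assignment may be sketched as follows. This is an illustrative assumption and not the codespace generation method referred to above: element names are ranked by frequency in a sample document, the most frequent name receives the smallest code, and codes are written as variable-length integers so that small codes occupy fewer bytes. The functions build_code_table and encode_varint are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def build_code_table(root):
    """Assign the smallest codes to the most frequently occurring element names."""
    counts = Counter(elem.tag for elem in root.iter())
    # Sort by descending count, then by name so the table is deterministic.
    ranked = sorted(counts, key=lambda name: (-counts[name], name))
    return {name: code for code, name in enumerate(ranked)}


def encode_varint(value):
    """Write a non-negative integer using 7 bits per byte, low-order group first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)
```

With such a table, the first 128 names in frequency order each cost a single byte in the structure stream, while rarely used names cost two or more.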

Separating element and attributes 

An XML tag typically comprises an element name and a set of attribute
name/value pairs. Potentially, a large set of attributes can be specified with an element
instance. Hence, separating an element name from the attributes will allow the document
tree to be parsed and elements to be located more quickly. In addition, some attributes or
attribute name/value pairs tend to be used much more frequently than the others.
Grouping attribute name, value and name/value pairs into different sections usually results
in better compression.

Encoding values of built-in datatypes and special types 

The encoder operates to encode the values of attributes and elements of built-in (or
default) datatypes into more efficient representations according to their types. If the
schema that contains the type information is not available, the values are treated as strings.
In addition, if a value (for instance, a single-digit integer) is more efficiently represented as
a string, the encoder may also choose to treat it as a string and not encode it. By default,
strings are encoded as a Unicode Transformation Format (UTF-8) string which provides a
standard and efficient way of encoding a string of multi-byte characters. In addition, the
UTF string includes length information, avoiding the problem of finding a suitable
delimiter and allowing one to skip to the end of the string easily.
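The benefit of carrying length information with each string may be sketched as follows, assuming (for illustration only) a 2-byte big-endian length prefix ahead of the UTF-8 bytes. The helper names are hypothetical.

```python
import struct


def write_string(value):
    """Encode a string as a 2-byte length followed by its UTF-8 bytes."""
    data = value.encode("utf-8")
    return struct.pack(">H", len(data)) + data


def read_string(buf, offset):
    """Return (string, offset of the next item)."""
    (length,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    return buf[start:start + length].decode("utf-8"), start + length


def skip_string(buf, offset):
    """Jump past a string without decoding it or scanning for a delimiter."""
    (length,) = struct.unpack_from(">H", buf, offset)
    return offset + 2 + length
```

No delimiter needs to be reserved or escaped, so string content such as "x<y" may be stored unchanged, and a decoder that does not need a string can step over it in constant time.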
Special type encoders can be used for special data types. These special type
encoders can be specified using the setTypeEncoder() interface of the Encoder API (as
discussed below). Information about the special type encoders is preferably stored in the
header of the structure segment, advantageously as a table of type encoder identifiers.
Further, the default type encoders (for the built-in datatypes) can be overridden using the
same mechanism. As such where some built-in data type would ordinarily be encoded
using a default encoder, a special encoder may alternatively be used, such necessitating
identification within the bitstream that an alternative decoding process will be required for
correct reproduction of the XML document. Each encoded value is preceded by the
identifier of the type encoder that was used to encode the value.
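A registry of type encoders in which each encoded value is preceded by the identifier of its encoder may be sketched as follows. The class and method signatures here are assumptions made for illustration; in particular, set_type_encoder is only loosely modelled on the setTypeEncoder() interface named above, whose actual parameters are not reproduced in this section, and the one-byte identifier is likewise assumed.

```python
import struct


def encode_string(text):
    data = text.encode("utf-8")
    return struct.pack(">H", len(data)) + data


def encode_int(text):
    # A 4-byte signed integer stands in for a built-in integer datatype.
    return struct.pack(">i", int(text))


class Encoder:
    """Hypothetical encoder holding a table of type encoder identifiers."""

    def __init__(self):
        # Type name -> (identifier, encoding function); strings are the fallback.
        self._encoders = {"string": (0, encode_string)}

    def set_type_encoder(self, type_name, identifier, func):
        # Registers a special encoder, or overrides a default one.
        self._encoders[type_name] = (identifier, func)

    def encode_value(self, type_name, text):
        # Values of unknown type are treated as strings.
        identifier, func = self._encoders.get(type_name, self._encoders["string"])
        return bytes([identifier]) + func(text)
```

For example, after enc = Encoder() and enc.set_type_encoder("int", 1, encode_int), the value "1000" of type "int" is emitted as the identifier byte 1 followed by four bytes, whereas a value of an unregistered type falls back to the string encoder with identifier 0.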

In this fashion, an XML document encoder implemented according to the present
disclosure may include a number of encoding formats for different types of structure and
text within the XML document. Certain encoding formats may be built-in or default and
used for well known or commonly encountered data types. Special type encoders may be
used for any special data types. In such cases, an identification of the particular type
encoder(s) used in the encoding process may be incorporated into the header of a packet,
thereby enabling the decoder to identify those decoding processes required to be used for
the encoded types in the encoded document. Where appropriate, the particular type
encoders may be accessible from a computer network via a Uniform Resource Identifier
(URI). Where the decoder is unable to access or implement a decoding process
corresponding to an encoded type encountered within a packet in the encoded document, a
default response may be to ignore that encoded data, possibly resulting in the reproduction
of null data (eg. a blank display). An alternative is where the decoder can operate to fetch
the special type decoder from a connected network, for example using a URI that may
accompany the encoded data. The URI of an encoder/decoder format may be incorporated
into the table mentioned above and thereby included in the bitstream (see the Appendix).

In a further extension of this approach, multiple encoding formats may be used for
a single data type. For example, text strings may be encoded differently based upon the
length of the string, such representing a compromise between the time taken to perform a
decoding process and the level of compression that may be obtained. For example, text
strings with 0-9 characters may not be encoded, whereas strings with 10-99 and 100-999
characters may be encoded with respective (different) encoding formats. Further, one or
more of those encoding formats may be for a special data type. As such the encoder when
encoding text strings in this example may in practice use no encoding for 0-9 character
strings, a default encoder for 10-99 character strings, and a special encoder for strings
having 100 or more text characters.
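The length-based selection just described may be sketched as follows. The thresholds follow the example in the text, but the use of two zlib compression levels as stand-ins for the "default" and "special" encoders is purely an assumption for illustration, as are the identifier values.

```python
import zlib

RAW, DEFAULT, SPECIAL = 0, 1, 2  # assumed encoder identifiers


def choose_encoder(text):
    """Pick an encoder from the string length: 0-9, 10-99, or 100 and above."""
    n = len(text)
    if n < 10:
        return RAW
    if n < 100:
        return DEFAULT
    return SPECIAL


def encode_text(text):
    data = text.encode("utf-8")
    kind = choose_encoder(text)
    if kind == DEFAULT:
        payload = zlib.compress(data, 1)  # fast, lighter compression
    elif kind == SPECIAL:
        payload = zlib.compress(data, 9)  # slower, stronger compression
    else:
        payload = data  # short strings are stored as-is
    # The identifier tells the decoder which decoding process to apply.
    return bytes([kind]) + payload


def decode_text(blob):
    kind, payload = blob[0], blob[1:]
    data = payload if kind == RAW else zlib.decompress(payload)
    return data.decode("utf-8")
```

Since the identifier precedes each value, the decoder needs no knowledge of the thresholds used by the encoder.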

Fig. 10 shows an example of a method 1000 of encoding an XML document, that
has an entry point of step 1002. Initially, at step 1004, the method 1000 examines the
XML document 104 to identify each data type forming part of the XML document 104. At
step 1006, the method 1000 operates to identify a first set of the data types for which a
corresponding special encoding format is available. Having identified the special data
types, step 1008 encodes each part of the XML document having a data type in the first set
with the corresponding special encoding format. Next, in step 1010, the method 1000
encodes each remaining part of the XML document with a default encoding format
corresponding to the data type of the remaining part. In step 1012, a representation is
formed of the information referencing at least each of the data types in the first set with the
corresponding special encoding format. In step 1014, the representation is associated with
the encoded parts as an encoded form of the XML document 104. The method 1000 then
ends at step 1016.

The Structure Segment (or Structure Stream)

Fig. 2 shows the various sections of the structure segment (or stream) 106. The
structure segment begins with a header 202 and its body is divided into a number of
sections 204. The header 202 identifies the version of the XML and that of the encoding
format.

Each section 204 in the body begins with a unique signature indicating the section
type. Hence, it is not necessary for the various sections to be arranged in a particular
order. Nevertheless, in the following discussion, we assume the sections to be arranged in
the order shown in Fig. 2. The section signature is followed by a size field which indicates
the size of the section.
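A reader for such self-describing sections may be sketched as follows. The 4-byte signatures (for example b"HIER" and b"IDTB") and the 4-byte size field are hypothetical choices made for illustration and are not taken from the Appendix.

```python
import struct


def write_section(signature, payload):
    """Emit a section: 4-byte signature, 4-byte size, then the payload."""
    return signature + struct.pack(">I", len(payload)) + payload


def read_sections(body):
    """Index the sections of a segment body by signature, in whatever order they occur."""
    sections = {}
    offset = 0
    while offset < len(body):
        signature = body[offset:offset + 4]
        (size,) = struct.unpack_from(">I", body, offset + 4)
        start = offset + 8
        sections[signature] = body[start:start + size]
        offset = start + size  # the size field lets the reader hop to the next section
    return sections
```

Because each section names itself and states its own size, the reader neither depends on section order nor needs to understand a section in order to step over it.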

An ID table section 206 allows elements with ID's to be located quickly in a
document hierarchy section 208. The ID table 206 may be absent from an encoded
document even if the document has elements with ID's. This is because the DTD's or
schema which contain the ID definition may not be available at the time of encoding.

A section 210 is preferably reserved for the document type declaration and the
internal (DTD) subset. For XML Schema-based documents, for example MPEG-7
descriptions, this section 210 will be absent.
There are sections for the code tables for namespaces 212, element names 214,
attribute names 216 and attribute values 218. Hereafter these code tables will be referred
to as local code tables to differentiate them from any code tables that are pre-defined for
both the encoder and decoder and are not carried in the bitstream. For instance, there may
be pre-defined code tables for MPEG-7 or XML Schema.
The local code tables are usually followed by a section containing a table of
attribute name/value pairs 220 which makes use of the codes defined in the local code
tables as well as any pre-defined code tables.

The document hierarchy section 208 is the encoded tree structure of the XML
document using codes from the local and the pre-defined code tables.

Apart from using code tables and type encoders for encoding, in most cases, the
encoder also compresses each section using a compressor. Instead of compressing each
section of the body of the structure segment 106 independently, the body of the structure
segment can be compressed together. This may actually result in a better compression ratio
due to less overhead and the larger amount of data. However, such compression requires
one to decompress the whole structure body in order to find out whether a document
contains a particular element. Both approaches may be tested to determine which works
better in practice. Nevertheless, if a section is small, compression may not be effective
and the encoder may choose not to compress the section. Each section has a compressed
flag to signal whether compression has been applied. If compression has been applied, the
size field at the beginning of each section indicates the compressed (rather than the
uncompressed) size of the section in bytes.

Potentially, a different compressor can be used for each section taking into account
the characteristics of the data in each section. Information about the compressors used is
provided in the header. The default is to use ZLIB for compressing all the sections in the
structure segment as well as the text segment. The ZLIB algorithm generates a header and
a checksum that allow the integrity of the compressed data to be verified at the decoder
end.
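Per-section compression with a compressed flag may be sketched as follows using Python's standard zlib module. The layout (a 1-byte flag followed by a 4-byte size, with the section signature omitted for brevity) is an assumption for illustration, as is the rule of keeping the compressed form only when it is smaller.

```python
import struct
import zlib


def pack_section(payload):
    """Compress a section only when doing so actually saves space."""
    compressed = zlib.compress(payload)
    if len(compressed) < len(payload):
        flag, data = 1, compressed
    else:
        flag, data = 0, payload  # small sections are left uncompressed
    # When the flag is set, the size field holds the compressed size in bytes.
    return struct.pack(">BI", flag, len(data)) + data


def unpack_section(blob):
    flag, size = struct.unpack_from(">BI", blob, 0)
    data = blob[5:5 + size]
    # zlib.decompress raises zlib.error if the stream's checksum does not match.
    return zlib.decompress(data) if flag else data
```

The checksum appended by zlib means that a corrupted compressed section is detected during decompression rather than silently yielding a damaged document hierarchy.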

The Text Segment (or Text Stream)
The text segment 108 begins with a text segment signature followed by a size field
that indicates the size of the encoded text. The text segment contains a sequence of UTF-8
strings which are the text of the elements.
The Encoder and Decoder Models
The Encoder Model

Fig. 3 shows an XML encoder model 300 incorporating an encoder 302 for
encoding the XML document 104 into a bitstream 306 for storage or transmission. The
encoder model 300 may be implemented as a software program or sub-programs operating
within the computer module 801, the program being typically stored in the HDD 810 and
read and controlled in its execution by the processor 805. The bitstream 306 may be
transmitted upon creation via the I/O interface 808 and network 820 for complementary
decoding and reproduction by the server computer 850. Alternatively, the bitstream 306
may be stored in the HDD 810 or as a CD-ROM in the drive 812 for subsequent
reproduction. The encoder 302 may support an Application Program Interface (API) 308
(eg. the DOM API) so that the document tree 102 can be encoded as the tree 102 is being
created. A standard library 310 (for XML) is used to provide code tables 312,
encoders 314 for built-in datatypes, and default compressors 316 that may be used in the
encoding processes. Domain-specific libraries 318 may also be defined for various
domains. Each domain-specific library 318 may contain code tables 320 for the particular
domain and encoders 322 for some data types. An application can also provide specific
modules 324 including application-specific encoders 326 for special data types as
discussed above and corresponding compressors 328. However, these type encoders 326
and compressors 328 have to be either downloadable and platform-independent or
pre-installed at the decoder end. An application can also instruct the encoder 302 to use its
pre-defined code tables 330. The code tables 330 can be incorporated into the
bitstream 306 or pre-installed at the decoder end. Each of the individual encoders and
compressors shown in Fig. 3 may be implemented by software (sub)programs or, in some
instances, special purpose hardware (eg. for fast encoding).
The Decoder Model

Fig. 4 shows a complementary XML decoder model 400 including a decoder 402
for decoding the XML bitstream 306 to output an XML document 104. Alternatively, the
decoder may support an API 408 (eg. the SAX ("simple API for XML") or DOM API) that
allows an application to construct its own internal model of the document tree 102. This
saves the decoder 402 from outputting the XML document 104 and the application from
re-parsing the reconstructed XML document 104. In either case, the decoder 402 uses the
standard library 410, any domain-specific libraries 418 as well as any pre-installed or
downloaded application-specific modules 424 (that were used by the encoder) when
decoding the XML bitstream 306. In Fig. 4, elements of the decoder model 400 are
numbered in a similar fashion to that of Fig. 3, such that where a difference of 100 exists
in the numbering, the elements have corresponding like functions. The decoder model 400
may for example be implemented within the computer module 801 to decode the
bitstream 306 received via the network 820 from the server computer 850. Alternatively,
the decoder model 400 may operate to decode a bitstream obtained from the CD-ROM, for
example. Like the encoder 302, software and hardware decoding processes may be used
within the decoder 402.

In most cases, the decoder 402 at the client end need not validate the decoded
XML document 104 of Fig. 4 against its DTD or schema. Validation at the client side
is costly, inefficient and most likely redundant. The decoder 402 may assume that the
XML documents have been validated against their DTD's or schemas at the server end.
Similarly, the underlying transport, as well as any error detection mechanism such as
checksums that is built into the binary format, should be capable of catching any
transmission error.
Locating Elements

XML elements can be referenced and located using ID's or XPath/XPointer
fragments. As mentioned earlier, the ID table 206 of the structure segment 106 allows
elements with ID's to be located quickly in the document hierarchy section 208. Any text
and attributes associated with an element can then be located efficiently using the locators
in the encoded elements.

Below are some examples of XPath fragments that can be appended to a Uniform
Resource Identifier (URI):

• /doc/chapter[2]/section[3]

selects the third section of the second chapter of doc

• chapter[contains(string(title),"Overview")]

selects the chapter children of the context node that have one or more title
children containing the text "Overview"

• child::*[self::appendix or self::index]

selects the appendix and index children of the context node

• child::*[self::chapter or self::appendix][position()=last()]

selects the last chapter or appendix child of the context node

• para[@type="warning"]

selects all para children of the context node that have a type attribute with value
"warning"

• para[@id]

selects all the para children of the context node that have an id attribute.

An XPath/XPointer fragment consists of a list of location steps representing the
absolute or relative location of the required element(s) within an XML document.
Typically, the fragment contains a list of element names. Predicates and functions may be
used, as in the examples above, to specify additional selection criteria such as the index of
an element within an array, the presence of an attribute, matching attribute value and
matching textual content.
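As an illustration only, some of the simpler fragments above can be tried against plain (unencoded) XML with Python's standard xml.etree.ElementTree module. The sample document is invented for this sketch. ElementTree implements only a subset of XPath, so predicates such as contains() or position()=last() would need a full XPath engine and are not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = """
<doc>
  <chapter><title>Introduction</title><section/></chapter>
  <chapter>
    <title>Overview</title>
    <section id="s1"/>
    <section id="s2"/>
    <section id="s3">
      <para type="warning">Check the input.</para>
      <para id="p2">Plain note.</para>
    </section>
  </chapter>
</doc>
"""

root = ET.fromstring(SAMPLE)  # root is the <doc> element itself

# /doc/chapter[2]/section[3], written relative to the <doc> root element.
third_section = root.find("./chapter[2]/section[3]")

# para[@type="warning"] and para[@id], with the third section as context node.
warning_paras = third_section.findall("para[@type='warning']")
paras_with_id = third_section.findall("para[@id]")

print(third_section.get("id"))
print([p.text for p in warning_paras])
print([p.text for p in paras_with_id])
```

Each location step narrows the set of candidate elements, which is the behaviour the encoded document hierarchy is designed to reproduce without building a full object tree.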

The compactness of the encoded document hierarchy allows it to be parsed (and
instantiated) without expanding into a full object tree representation. The fragment
address is first translated into an encoded form. One of the consequences of such a
translation process is that it allows one to determine immediately whether the required
element(s) actually occur in the document. Matching the components of the encoded
fragment address is also much more efficient than matching sub-strings. The design
allows simple XPath/XPointer fragments (which are most frequently used) to be evaluated
quickly. Searching the document hierarchy first also greatly narrows the scope of
subsequent evaluation steps in the case of a more complex fragment address.
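The translation step described above can be sketched as follows. This is a minimal illustration, not the encoding defined in the Appendix: the code table is modelled as a plain dictionary, the hierarchy as nested (code, children) tuples, and the names encode_path and find are hypothetical.

```python
def encode_path(path, name_codes):
    """Translate '/doc/chapter/section' into a tuple of element codes.

    Returns None when a name has no code, meaning the element cannot
    occur anywhere in the encoded document.
    """
    codes = []
    for name in path.strip("/").split("/"):
        code = name_codes.get(name)
        if code is None:
            return None
        codes.append(code)
    return tuple(codes)


def find(nodes, codes):
    """Return the nodes matched by a code path; nodes is a list of (code, children)."""
    if not codes:
        return []
    matches = [node for node in nodes if node[0] == codes[0]]
    if len(codes) == 1:
        return matches
    found = []
    for _, children in matches:
        found.extend(find(children, codes[1:]))
    return found


# Hypothetical element name code table and encoded hierarchy.
NAME_CODES = {"doc": 1, "chapter": 2, "section": 3, "para": 4}
HIERARCHY = [(1, [(2, [(3, []), (3, [(4, [])])]), (2, [])])]

section_path = encode_path("/doc/chapter/section", NAME_CODES)
print(section_path)
print(len(find(HIERARCHY, section_path)))
print(encode_path("/doc/appendix", NAME_CODES))
```

A missing code answers the "does it occur at all" question before the hierarchy is touched, and the search itself compares small integers rather than sub-strings.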
Packetizing the Bitstream for Streaming
Streaming XML

Traditionally, XML documents are mostly stored and transmitted in their raw
textual format. In some applications, XML documents are compressed using some
traditional text compression algorithms for storage or transmission, and decompressed
back into XML before they are parsed and processed. Although compression may greatly
reduce the size of an XML document, under such circumstances an application still must
receive the entire XML document before parsing and processing can be performed.

Streaming an XML document implies that parsing and processing can start as soon
as a sufficient portion of the XML document is received. Such a capability will be most
useful in the case of a low bandwidth communication link and/or a device with very
limited resources.

Because an ordinary XML parser expects an XML document to be well-formed
(ie. having matching and non-overlapping start-tag and end-tag pairs), the parser can only
parse the XML document tree in a depth-first manner and cannot skip parts of the
document unless the content of the XML document is reorganized to support it.
Packetizing the Bitstream 

Encoding an XML document into a complete structure segment 106 and a
complete text segment 108 as described earlier will greatly reduce the size of the data and,
at the same time, allow some transmission errors to be detected. Nevertheless, the
decoder 402 still has to receive a large amount of the encoded data before it can process it.
For instance, the decoder 402 has to receive the code tables 110 in their entirety
before parsing of the document hierarchy can commence. At the same time, the decoder
402 has to wait for the arrival of a certain part of the text segment 108 to get the text
that is associated with a node. To allow processing to be started as soon as possible at the
decoder end, the XML document 104, as seen in Fig. 5, has to be encoded incrementally,
allowing small packets 502 of encoded data 500 to be sent to the decoder 402 as they
become available. In Fig. 5, the cross-hatched packets 504 denote structure packets and
the diagonal-lined packets 506 denote text packets. These packets are preceded by a
header packet 508 and followed by a trailer packet 510. In the preferred arrangement, each
data packet 502 has the same structure as a complete structure segment 106 or a complete
text segment 108. At the same time, each packet 502 may be dependent on those
packets 502 sent before it or, in some implementations, on a predetermined number of
packets sent after it. Such a predetermined number may be determined dynamically.

Apart from the need to process a document while it is being delivered, an
encoder/decoder typically has an output/input buffer of fixed size. Accordingly, except for
very short documents, the encoder 302 has to encode an XML document incrementally
into multiple packets. Each of the packets 502 (including 504, 506, 508 and 510) is
headed by a packet header. The packet header contains a packet number that is used as a
packet ID as well as for ordering the packets and detecting any missing packets. The
packet header also contains a size field which indicates the size of the packet 502 in bytes
and a type field which indicates whether the packet is a structure packet 504, a text
packet 506, a header packet 508, a trailer packet 510 or a further type of packet 502,
named a command packet, not illustrated in Fig. 5, but described later in this document.
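The use of the packet number for ordering and for detecting missing packets can be sketched as below. The PacketHeader class and the assumption that packet numbers run consecutively from 1 are made for this illustration only; the text requires only that the number identifies and orders the packets.

```python
from dataclasses import dataclass


@dataclass
class PacketHeader:
    number: int  # packet number, used as the packet ID (assumed to start at 1)
    size: int    # size of the packet in bytes
    type: str    # "header", "structure", "text", "command" or "trailer"


def order_and_find_missing(headers):
    """Sort headers by packet number and list the numbers not yet received.

    Assumes consecutive numbering from 1, which is an assumption of this sketch.
    """
    ordered = sorted(headers, key=lambda h: h.number)
    if not ordered:
        return ordered, []
    seen = {h.number for h in ordered}
    missing = [n for n in range(1, ordered[-1].number + 1) if n not in seen]
    return ordered, missing


received = [
    PacketHeader(3, 120, "text"),
    PacketHeader(1, 40, "header"),
    PacketHeader(5, 200, "structure"),
    PacketHeader(2, 512, "structure"),
]
ordered, missing = order_and_find_missing(received)
print([h.number for h in ordered])
print(missing)
```

A decoder could use the missing list to decide whether to wait, request retransmission, or continue with the parts of the tree already available.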

For each structure packet 504, the ID table incorporated therein contains only the
ID's of those elements included in the packet. Its code tables contain only new codes that
have not been transmitted. Codes that have been transmitted will not be re-assigned or
re-mapped. The default implementation simply appends new values to the table and uses the
index (augmented by the base index of the table) of the entries as their codes. A slightly
more complicated (but more code efficient) method is to count the number of occurrences
of the values and remap the codes so that values that occur more frequently are
remapped to shorter codes just before the packets are output. If a pre-defined code table is
used or if the remapping is not based on the number of occurrences, sorting the values
before compressing may result in a better compression rate. A different algorithm for
assigning codes can be implemented. Nevertheless, once output, the codes are fixed and
cannot be re-assigned to other values or re-mapped in subsequent packets. Pre-defined
code tables can also be specified using the UseCodeTable() method of the Encoder
Interface described later in this specification. The method also allows one to specify
whether the pre-defined code table is to be encoded with the data into the bitstream. The
code tables of a number of namespaces which are fundamental to XML (or an application
domain such as MPEG-7) are expected to be hardwired to all XML (MPEG-7) encoders
and decoders and need not be encoded into the bitstream.
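The two code assignment strategies described above can be sketched as follows. The class and function names are hypothetical, and the sketch ignores namespaces, types and pre-defined codes. It shows only the default append-only assignment (code = index + base index) and a frequency-based assignment applied before any packet is output.

```python
from collections import Counter


class AppendOnlyCodeTable:
    """Default scheme: new values are appended; the code is index + index_base."""

    def __init__(self, index_base=1):
        if index_base < 1:
            raise ValueError("code 0 is reserved, so index_base must be positive")
        self.index_base = index_base
        self.values = []
        self._code_of = {}
        self._sent = 0  # number of entries already output in earlier packets

    def code_for(self, value):
        # Codes already assigned are never re-assigned or re-mapped.
        if value not in self._code_of:
            self._code_of[value] = self.index_base + len(self.values)
            self.values.append(value)
        return self._code_of[value]

    def take_new_entries(self):
        """Return only the entries not yet transmitted, for the next structure packet."""
        new = self.values[self._sent:]
        self._sent = len(self.values)
        return new


def assign_by_frequency(values, index_base=1):
    """Alternative scheme: more frequent values receive smaller codes."""
    ranked = Counter(values).most_common()
    return {value: index_base + i for i, (value, _) in enumerate(ranked)}


table = AppendOnlyCodeTable()
first = [table.code_for(name) for name in ["doc", "chapter", "chapter"]]
packet1_entries = table.take_new_entries()
second = [table.code_for(name) for name in ["chapter", "section"]]
packet2_entries = table.take_new_entries()
print(first, packet1_entries)
print(second, packet2_entries)
print(assign_by_frequency(["a", "b", "b", "c", "b", "c"]))
```

The append-only table is what allows later packets to carry only new entries: a decoder that has processed the earlier packets already holds the rest of the table.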

If an ID, an element name, an attribute name, or an attribute value is longer than a
pre-defined length, it will be encoded in a text packet and a string locator rather than the
actual string will appear in the tables.

The document hierarchy section of a structure packet contains a sequence of nodes.
Each node has a size field that indicates its (encoded) size in bytes including the total size
of its descendant nodes encoded in the packet. The node can be an element node, a
comment node, a text node or a node locator. Each node has a nodeType field that
indicates its type.

The document hierarchy may contain:

(i) a complete document tree: this is only possible for very short documents;

(ii) a complete sub-tree: the sub-tree is the child of another node encoded in an
earlier packet; and

(iii) an incomplete sub-tree: the sub-tree is incomplete because the whole
sub-tree cannot be encoded into one packet due to time and/or size constraints.

Node locators are used in the manner shown in Fig. 6A, for a tree structure 622
which has incomplete sub-trees 602 and 604, for locating the missing nodes and the
descendants of the incomplete sub-trees. In this regard, and with reference to the earlier
example, whilst the hierarchical tree-representation 102 of the document 104 is known
when encoding takes place, upon decoding of the communicated packets, only portions of
the tree representation 102 will typically be made available. As more packets are received
the tree may be reconstructed. For example, in the data stream shown in Fig. 6B, a
packet 620 (being the #2 packet in the data stream in this example) includes part of the
tree structure 622 of a document, that structure including nodes A, B1, B2 and B3.
However, in this example, the size of the packet 620 is insufficient to describe the entire
tree structure 622 and to accommodate other nodes, such as B4 and D1. Node
locators 608 and 606 respectively are thus incorporated into the descriptions of the
corresponding parent nodes (B3 and B2 respectively) and contain the respective packet
numbers 610 and 612 of the structure packets that contain a sequence of missing nodes and
their sub-trees. As such, on receiving the sequence of packets illustrated in Fig. 6B, part
of the tree 622 can be reconstructed upon receiving the packet (#2) 620 and the branch
including node D1 can be reconstructed upon receiving packet (#7) 610 and the balance of
the tree reconstructed upon receiving packet (#20) 612.
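Incremental reconstruction with node locators can be sketched as below. The example is only loosely modelled on Fig. 6B: the placement of the locators, the node C1 and the Node, resolve and outline names are assumptions of this sketch, and a locator is represented simply by the packet number it points to.

```python
class Node:
    """Element node; children hold Node objects or ints (node locators)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def resolve(node, packet_number, subtrees):
    """Replace every locator for packet_number below node with the decoded sub-trees.

    Returns True if at least one locator was replaced.
    """
    replaced = False
    new_children = []
    for child in node.children:
        if isinstance(child, int):
            if child == packet_number:
                new_children.extend(subtrees)
                replaced = True
            else:
                new_children.append(child)  # locator for a packet not yet received
        else:
            replaced = resolve(child, packet_number, subtrees) or replaced
            new_children.append(child)
    node.children = new_children
    return replaced


def outline(node):
    """Render the tree as text, showing unresolved locators as @<packet number>."""
    parts = []
    for child in node.children:
        parts.append(f"@{child}" if isinstance(child, int) else outline(child))
    return node.name + (f"({','.join(parts)})" if parts else "")


# Structure packet #2: a partial tree with locators to packets #7 and #20.
tree = Node("A", [Node("B1"), Node("B2", [7]), Node("B3"), 20])
after_packet_2 = outline(tree)

resolve(tree, 7, [Node("D1")])                  # structure packet #7 arrives
resolve(tree, 20, [Node("B4", [Node("C1")])])   # structure packet #20 arrives
after_packet_20 = outline(tree)

print(after_packet_2)
print(after_packet_20)
```

The partial tree is usable as soon as packet #2 has been decoded; later packets only fill in the positions marked by locators.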

Each element node preferably contains a namespace code, an element (name) code,
and, if the element has attributes, the byte offset of the first attribute in the attribute
name/value pair table and the number of attributes.

Each text node or comment node typically contains a text locator rather than the
actual text. The text locator specifies the packet number of a text packet and a byte offset
into the text packet.

In some cases, a string may exceed the maximum size of a packet. Where such
occurs, the string is stored as fragments over multiple text packets, as shown in Fig. 7.
Each text packet 702 has a flag 704 indicating whether it contains a list of UTF-8 encoded
strings and string locators or a string fragment. In the case of a string fragment, the packet
number of the next fragment is also included. If a text packet contains the last (or the
only) fragment of a string, the packet number for the next fragment is set to zero, as
shown.
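Reassembly of a fragmented string can be sketched as follows. The dictionary layout and the reassemble function are hypothetical. The fragments are joined as bytes before decoding on the assumption that a multi-byte UTF-8 character may be split across two packets.

```python
def reassemble(text_packets, first_packet):
    """Follow the chain of string fragments starting at first_packet.

    text_packets maps a packet number to (fragment_bytes, next_packet_number);
    a next packet number of zero marks the last fragment.
    """
    parts = []
    number = first_packet
    while number != 0:
        fragment, number = text_packets[number]
        parts.append(fragment)
    # Decode only after joining, since a character may span two fragments.
    return b"".join(parts).decode("utf-8")


data = "héllo wörld".encode("utf-8")
# Packet numbers 4, 9 and 12 are arbitrary; the first split falls inside 'é'.
packets = {
    4: (data[:2], 9),
    9: (data[2:8], 12),
    12: (data[8:], 0),
}
print(reassemble(packets, 4))
```

A decoder that receives only some of the fragments would hit a missing key here; a fuller implementation would wait for the packet named in the chain.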

Commands for Constructing Document Tree

An XML document may be packetized for streaming to the receiver as it is being
encoded or even generated (according to some pre-defined DTD or schema). In this case,
the XML document is typically constructed in real-time using an API such as a DOM API.
Instead of parsing an XML file, the encoder 302 operates to construct the bitstream 306
from the memory representation directly. Nodes and sub-trees inserted and appended
using the API are encoded as (binary) command packets to modify the memory
representation at the decoder end. The packet number ensures that the command packets
are executed in the correct sequence.

Since the nodes transmitted are parts of the same document (that conforms to some
pre-defined DTD or schema) and the document is on-line and in-sync between the
encoder 302 and decoder 402 all the time, there should not be any consistency issue in
relation to the content of the nodes. In some presentations, certain information has only
temporal relevance. That is, some information is only relevant within a certain period of
time during the presentation. Information units (for example, the score of a football
match) that are relevant to two different time instances of the presentation may themselves
be inconsistent. A presentation description scheme is desirable to establish the timing and
synchronization model of a presentation. The timing of any media object including XML
data can be indicated by a start time and a duration. Such a presentation encoder/decoder
pair would typically include an XML encoder/decoder as described above arranged
internally. The presentation decoder, rather than the XML decoder, operates to interpret
the start time and duration attributes. The presentation encoder also decides whether or
not to remove from memory an XML sub-tree that is no longer relevant. As far as the
XML encoder/decoder is concerned, there is no consistency issue. If the generator is
always required to generate valid documents (or fragments), then there is no need for a
command to remove (possibly inconsistent or invalid) nodes or sub-trees. That is, only
insert and append commands are needed.

A command packet contains the path of (the root of) the sub-tree to be appended or
inserted and the packet number of the structure packet that contains the sub-tree. For
example, returning to Fig. 6B, if the locator 608 for node B4 was not able to be
accommodated in the packet 620, then a command packet would have to be inserted
between packets #2 and #20 that effectively attaches node B4 to node A. That command
packet would then include a locator pointing to the packet 612 including the structure
defined by node B4.
The Definition of the Bitstream

The bitstream 306 is preferably defined in Extended Backus-Naur Form (EBNF) in
the fashion defined by the Appendix. Characters are enclosed by single quotes and strings
by double quotes. Unless stated otherwise, UCS characters in UTF-8 encoding and UTF
strings (that include length information) are assumed.
API

API for Documents and Schemas

It is not always necessary for the decoder 402 to convert an encoded document
back into XML. As indicated above, the decoder 402 may support an API such as the
SAX API, the DOM API, or other proprietary API, to allow an application to access the
decoded content directly. This saves the decoder 402 from having to reconstruct and
output the XML document and the application from having to re-parse the reconstructed
XML document.

An application may also have to access information stored in schemas. As
schemas are also XML documents, they can be encoded in the same way. Using existing
SAX or DOM API for accessing and interpreting schema definitions is extremely tedious.
A parser that supports a schema API, such as the Schema API defined in Wan E.,
Anderson M., Lennon A., Description Object Model (DesOM), Doc. ISO/IEC
JTC1/SC29/WG11 MPEG00/M5817, Noordwijkerhout, March 2000, will make accessing
the definitions of schemas much easier.

To allow the values of built-in datatypes and special types to be encoded
efficiently, an encoder has to be able to obtain type information from the schemas. Hence,
a schema API is also extremely important to the encoder 302.
API for Encoders 

The binary format proposed below allows for the implementation of encoders of
various capabilities and complexity. The interfaces described in this section allow one to
construct a basic encoder that can be extended to provide the more complicated features
supported by the encoding scheme.
Encoder Interface

void SetMaxPacketSize(in unsigned long maxPacketSize)
• Set the maximum packet size in bytes.

void SetMaxPrivateDataSize(in unsigned long maxPrivateDataSize)
• Set the maximum size of the private data in bytes. Note that the amount of private data
that can be included in a packet is limited by the maximum size of the packet. A large
amount of private data is not expected as such works against the objective of reducing
the size of the bitstream.

void SetHeaderUserData(in ByteArray headerData)
• Write the user data to the header packet. Any existing data will be overwritten.

void UseCodeTable(in CodeTable codeTable, in Boolean encodeIt)
• Inform the encoder of a pre-defined code table and whether the code table should be
encoded with the data.

void SetCompressor(in Section section, in Inflater compressor)
• Instruct the encoder to use the specified compressor for the specified section. Section
is an enumeration with the following values: STRUCT_BODY=1, TEXT_BODY=2,
ID_TABLE=3, NS_SECT=4, ELEMENT_SECT=5, ATTR_NAME_SECT=6,
ATTR_VALUE_SECT=7, ATTR_PAIR_SECT=8, DOC_HIERARCHY_SECT=9.
Inflater has the same interface as Inflater of the java.util.zip package.

void Flush()
• Flush the packets in the buffer to the output stream.

void OnOutput()
• Receive notification before the set of packets in the buffer is output to allow the
application to insert application-specific data into the packets.

void SetPacketUserData(in ByteArray userData)
• Write the user data to each of the packets except any header packet in the buffer. Any
existing user data will be overwritten.
Code Table Interface

unsigned short GetSize()
• Get the number of entries in the code table.

wstring GetNamespace(in unsigned short i)
• Get the namespace of the value associated with the ith entry of the code table.

wstring GetValue(in unsigned short i)
• Get the value associated with the ith entry of the code table.

wstring GetType(in unsigned short i)
• Get the type of the value associated with the ith entry of the code table.

ByteArray GetCode(in unsigned short i)
• Get the code associated with the ith entry of the code table.

unsigned short GetIndexByCode(in ByteArray code)
• Get the index of the entry associated with a code.

unsigned short GetIndexByValue(in wstring value)
• Get the index of the entry associated with a value.

unsigned short GetMaxCodeValue()
• Get the maximum code value reserved by the code table. The encoder is free to use
code values above the maximum code value. Depending on the application, an encoder
may also be implemented to use holes left by a pre-defined code table.
Type Encoder Interface

ByteArray Encode(in wstring text)
• Encode the value into a byte array given its text representation.

wstring Decode(in ByteArray encodedText)
• Decode an encoded value into the text representation of the value.
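As a sketch of the Type Encoder Interface, two type encoders are shown below in Python. The class names are hypothetical. The one-byte boolean and the two-byte big-endian unsigned short follow the conventions stated in the Appendix; accepting "true", "false", "1" and "0" as boolean text is an assumption based on the XML Schema lexical forms.

```python
import struct


class BooleanTypeEncoder:
    """Encodes a boolean as one byte: 1 for true, 0 for false."""

    def encode(self, text):
        value = text.strip()
        if value in ("true", "1"):
            return b"\x01"
        if value in ("false", "0"):
            return b"\x00"
        raise ValueError(f"not a boolean: {text!r}")

    def decode(self, encoded):
        if encoded == b"\x01":
            return "true"
        if encoded == b"\x00":
            return "false"
        raise ValueError(f"not an encoded boolean: {encoded!r}")


class UnsignedShortTypeEncoder:
    """Encodes 0..65535 as two bytes, high-order byte first."""

    def encode(self, text):
        value = int(text)
        if not 0 <= value <= 0xFFFF:
            raise ValueError(f"out of range for unsigned short: {value}")
        return struct.pack(">H", value)

    def decode(self, encoded):
        (value,) = struct.unpack(">H", encoded)
        return str(value)


print(BooleanTypeEncoder().encode("true"))
print(UnsignedShortTypeEncoder().encode("258"))
print(UnsignedShortTypeEncoder().decode(b"\x01\x02"))
```

Note that decoding returns a canonical text form, so "1" encodes and decodes to "true"; the round trip preserves the value rather than the original spelling.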

Encoding the XML Data, in particular MPEG-7 Descriptions of a Presentation 

If (fragments of) XML data including MPEG-7 descriptions (which are XML data
used for describing audio-visual (AV) content) are to be streamed and presented with AV
content, the timing of and the synchronization between the media objects (including the
XML data) have to be specified. Like XML, the DDL (the description definition language
of XML) does not define a timing and synchronization model for presenting media
objects. As mentioned above, a SMIL-like MPEG-7 description scheme called herein
Presentation Description Scheme is desired to provide the timing and synchronization
model for authoring multimedia presentations.

It has been suggested that MPEG-7 descriptions can be treated in the same way as
AV objects. This means that each MPEG-7 description fragment, like AV objects, used in
a presentation will be tagged with a start time and a duration defining its temporal scope.
This allows both MPEG-7 fragments and AV objects to be mapped to a class of media
object elements of the Presentation Description Scheme and subjected to the same timing
and synchronization model. Specifically, in the case of a SMIL-based Presentation
Description Scheme, a new media object element such as an <mpeg7> tag can be defined.
Alternately, MPEG-7 descriptions can also be treated as a specific type of text.

It is possible to send different types of MPEG-7 descriptions in a single stream or
in separate streams. It is also possible to send an MPEG-7 description fragment that has
sub-fragments of different temporal scopes in a single data stream or in separate streams.
This is a role for the presentation encoder, in contrast to the XML encoder 300 discussed
earlier.

The presentation encoder wraps an XML packet with a start time and a duration
signalling when and for how long the content of the packet is required or relevant. The
packet may contain:

(i) multiple short description fragments (each with their own temporal scope)
concatenated together to achieve high compression rate and minimize overhead;

(ii) a single description fragment; and

(iii) part of a large description fragment.

In the case where the packet contains multiple description fragments, the start time
of the packet is the earliest of the start times of the fragments while the duration of the
packet is the difference between the latest of the end times of the fragments (calculated by
adding the duration of the fragment to its start time) and the start time of the packet.
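The rule above for a packet holding several fragments amounts to a small calculation, sketched here. The function name and the use of plain numbers for times are assumptions of the sketch; the text does not fix a time unit.

```python
def packet_scope(fragments):
    """Return (start, duration) of a packet from its fragments' (start, duration) pairs."""
    start = min(s for s, _ in fragments)
    latest_end = max(s + d for s, d in fragments)
    return start, latest_end - start


# Three fragments with end times 15, 22 and 13.
print(packet_scope([(10, 5), (12, 10), (11, 2)]))
```

The packet therefore stays relevant for as long as any of its fragments is relevant, even if individual fragments expire earlier.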

In broadcasting applications, to enable users to tune into the presentation at any
time, relevant materials have to be repeated at regular intervals. While only some of the
XML packets have to be resent as some of the XML packets sent earlier may no longer be
relevant, the header packet needs to be repeated. This means that, in the case of
broadcasting applications, the header packet may be interspersed among structure, text and
command packets to reset the transmission to a known state.

Industrial Applicability 
It is apparent from the above that the arrangements described are applicable to the
computer and data processing industries and to the efficient use of communication
resources associated therewith whilst affording the ability to work with partially received
information.

The foregoing describes only one or more embodiments of the present invention,
and modifications and/or changes can be made thereto without departing from the scope
and spirit of the invention, the embodiment(s) being illustrative and not restrictive. For
example, whilst described with reference to XML documents, the procedures disclosed
herein are applicable to any hierarchical representation, such as a tree representation of a
document.
Appendix:

Definition of the Bitstream

The bitstream will be defined in Extended Backus-Naur Form (EBNF). Characters will be
enclosed by single quotes and strings by double quotes. Unless stated otherwise, UCS
characters in UTF-8 encoding and UTF strings (that include length information) are
assumed.

xmlbitStream ::= xml_packet+

N.B.: The bitstream of an encoded XML document consists of a sequence
of packets. The sequence begins with a header packet and ends with a
trailer packet.

Packet

xml_packet ::= packet_header packet_body
packet_header ::= packet_signature packet_number packet_size packet_type
                  packet_private_data
packet_number ::= variable_length_natural_number
N.B.: packet_number has to be greater than 0.
packet_type ::= header_packet | structure_packet | text_packet | trailer_packet |
                command_packet
packet_signature ::= 'x' 'm' 'l' 'b' 'i' 'n' 'p' 'k'
packet_size ::= unsigned_short
N.B.: With unsigned_short, an unsigned integer in the range 0 - 65535 is
represented using 2 bytes with the first byte being the high-order byte
of the integer.
packet_private_data ::= byte_array
packet_body ::= header_body | trailer_body | structure_body | text_body |
                command_body
header_packet ::= 'h'
structure_packet ::= 's'
text_packet ::= (illegible in source)
trailer_packet ::= (illegible in source)
command_packet ::= (illegible in source)
byte_array ::= size_in_byte byte*
size_in_byte ::= variable_length_natural_number
N.B.: With variable_length_natural_number, a natural number in the range
0 - 1,073,741,823 is represented using 1 to 4 bytes with the first byte
being the high-order byte of the number. The two most significant
bits of the high-order byte are actually used to indicate the number of
additional bytes used for representing the number. For instance, '01'
implies one additional byte or a 2-byte representation and '11' implies
3 additional bytes or a 4-byte representation.
byte ::= [#x00-#xFF]
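The variable_length_natural_number layout described in the note above can be sketched as a pair of functions. Two points are assumptions of this sketch: that '00' and '10' in the two most significant bits mean zero and two additional bytes (following the pattern of the '01' and '11' examples), and that an encoder always chooses the shortest representation. The function names are hypothetical.

```python
def encode_vlnn(n):
    """Encode 0..1,073,741,823 in 1 to 4 bytes, high-order byte first."""
    if not 0 <= n <= 0x3FFFFFFF:
        raise ValueError("value out of range for variable_length_natural_number")
    extra = 0
    # 6 payload bits fit in the first byte, plus 8 per additional byte.
    while n >= 1 << (6 + 8 * extra):
        extra += 1
    raw = n.to_bytes(extra + 1, "big")
    # The top two bits of the first byte carry the number of additional bytes.
    return bytes([raw[0] | (extra << 6)]) + raw[1:]


def decode_vlnn(data, offset=0):
    """Decode one number at offset; return (value, offset of the next field)."""
    first = data[offset]
    extra = first >> 6
    end = offset + 1 + extra
    if end > len(data):
        raise ValueError("truncated variable_length_natural_number")
    value = first & 0x3F
    for byte in data[offset + 1:end]:
        value = (value << 8) | byte
    return value, end


for number in (0, 63, 64, 16383, 16384, 1073741823):
    encoded = encode_vlnn(number)
    print(number, encoded.hex(), decode_vlnn(encoded))
```

With this layout, values below 64 cost a single byte, which suits fields such as counts and small packet numbers.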



Header

header_body ::= encoding_version xml_version xml_params max_packet_size
                max_decompressed_packet_size max_packet_number
                section_compressor_list type_encoder_list
xml_params ::= count xml_encoding xml_standalone
encoding_version ::= "1.0"
xml_version ::= "1.0"
count ::= variable_length_natural_number
xml_encoding ::= UTF8_string
N.B.: With UTF8_string, the first two bytes are an unsigned short, the UTF
length, that specifies the number of additional bytes to be read. The
additional bytes contain the UTF-8 encoding of the string.
xml_standalone ::= 'y' | 'n'
max_packet_size ::= variable_length_natural_number
N.B.: A value of zero implies that the maximum packet size is unknown.
max_packet_number ::= variable_length_natural_number
N.B.: A value of zero implies that the maximum number of packets is
unknown.
section_compressor_list ::= count ( section_ID compressor_URI )*
type_encoder_list ::= count ( type_ID type_encoder_URI )*
compressor_URI ::= URI
type_encoder_URI ::= URI
URI ::= UTF8_string
section_ID ::= struct_body_ID | text_body_ID | id_table_ID | ns_section_ID |
               element_sect_ID | attribute_name_sect_ID | attribute_value_sect_ID
               | attribute_pair_sect_ID | doc_hierarchy_sect_ID
struct_body_ID ::= (illegible in source)
text_body_ID ::= (illegible in source)
id_table_ID ::= (illegible in source)
ns_section_ID ::= 'n'
element_sect_ID ::= 'e'
attribute_name_sect_ID ::= 'a'
attribute_value_sect_ID ::= (illegible in source)
attribute_pair_sect_ID ::= 'p'
doc_hierarchy_sect_ID ::= 'd'
type_ID ::= [#x00-#xFF]
other_type_ID ::= #x00
string_ID ::= #x01
string_locator_ID ::= #x02
boolean_ID ::= #x03
byte_ID ::= #x04
unsigned_short_ID ::= #x05
short_ID ::= #x06
unsigned_long_ID ::= #x07
long_ID ::= #x08
float_ID ::= #x09
double_ID ::= #x0A
date_ID ::= #x0B
time_ID ::= #x0C

N.B.: The above list of built-in datatypes is not complete. Types 00-0F
are for built-in datatypes. An XML encoder can assign types 10-FF to
application-specific types. The application is responsible for
providing the (Java) type encoder and decoder for any application-specific
types. These type encoders and decoders must be pre-installed
or downloaded before they are required. When type information is
not available, XML text and attribute values will be treated as strings.
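The UTF8_string layout used by the header rules above (a two-byte length followed by that many bytes of UTF-8) can be sketched as below. The function names are hypothetical, and the sketch uses standard UTF-8; whether the format intends Java's modified UTF-8, which differs for the NUL character and for supplementary characters, is not stated in the text and is left open here.

```python
import struct


def encode_utf8_string(text):
    """Prefix the UTF-8 bytes with their length as a big-endian unsigned short."""
    data = text.encode("utf-8")
    if len(data) > 0xFFFF:
        raise ValueError("string too long for a two-byte UTF length")
    return struct.pack(">H", len(data)) + data


def decode_utf8_string(buffer, offset=0):
    """Decode one UTF8_string at offset; return (text, offset of the next field)."""
    (length,) = struct.unpack_from(">H", buffer, offset)
    start = offset + 2
    end = start + length
    if end > len(buffer):
        raise ValueError("truncated UTF8_string")
    return buffer[start:end].decode("utf-8"), end


encoded = encode_utf8_string("UTF-8")
print(encoded)
print(decode_utf8_string(encoded))
```

Because the length counts bytes rather than characters, a decoder can skip a string it does not need without decoding it.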

Trailer

trailer_body ::=
N.B.: At the moment, the trailer packet is only used to signal the end of the
XML document. The body of the trailer packet is empty.

Structure Packet

structure_body ::= [ ID_table_section ] [ internal_subset_section ]
                   [ NS_table_section ] [ element_name_codetable_section ]
                   [ attribute_name_codetable_section ]
                   [ attribute_value_codetable_section ]
                   [ attribute_name_value_pair_section ] [ document_hierarchy_section ]
N.B.: Although the above EBNF rule defines the various sections of the
body of a structure packet to be arranged in a particular order, the
sections are actually allowed to be arranged in any order as each
section is identified by its unique signature.

ID Table Section

ID_table_section ::= ID_table_section_signature section_size compressed
                     entry_count ( ID_table | compressed_ID_table )
section_size ::= size_in_byte
N.B.: section_size stores the size of the section excluding its signature.
compressed ::= boolean
N.B.: The compressed flag indicates whether the table is compressed.
N.B.: With boolean, a byte value of 1 represents true and a byte value of 0
represents false.
entry_count ::= variable_length_natural_number
size_of_compressed_ID_table ::= variable_length_natural_number
ID_table ::= ( ID_string offset_to_the_document_hierarchy )*
N.B.: ID_table defines the structure of the uncompressed ID table. The ID
table only collects IDs of nodes (not including nodes referred to by
node locators) that appear in the document hierarchy of the same
packet. If type information is not available during encoding, IDs will
not be collected into the ID table even if they are present in the
document as there is no way the encoder can identify them.
ID_string ::= UTF8_string
offset_to_the_document_hierarchy ::= byte_offset
N.B.: offset_to_the_document_hierarchy is the byte offset to
document_hierarchy in the (uncompressed)
document_hierarchy_section, not the byte offset to the (uncompressed)
document_hierarchy_section.
byte_offset ::= variable_length_natural_number
ID_table_section_signature ::= #xFF01



Internal Subset Section

internal_subset_section ::= internal_subset_section_signature section_size
                            compressed [byte*]
N.B.: The detail of the internal subset section has yet to be defined.
internal_subset_section_signature ::= #xFF02



Namespace Table Section

NS_table_section ::= NS_table_section_signature section_size compressed
                     entry_count index_base ( NS_table | compressed_NS_table )
index_base ::= variable_length_natural_number
N.B.: The index into the NS_table is used as the namespace code. The
base of the index is specified in the field index_base. The namespace
code 0 is reserved for the null namespace. Hence, a namespace table
cannot have an index_base of 0.
NS_table ::= ( NS_URI )*
N.B.: NS_table defines the structure of the uncompressed NS table. The
index into the table is used as the namespace code. The base of the
index is specified in the field index_base. The namespace code 0 is
reserved for the null namespace. Hence, a namespace table cannot
have an index_base of 0.
NS_URI ::= URI
NS_table_section_signature ::= #xFF03
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Code Table Sections 

element_name_codetable_section ::= element_name_codetable_section_signature
section_size compressed entry_count index_base (
element_name_codetable | compressed_element_name_codetable )

attribute_name_codetable_section ::= attribute_name_codetable_section_signature
section_size compressed entry_count index_base (
attribute_name_codetable | compressed_attribute_name_codetable )

attribute_value_codetable_section ::= attribute_value_codetable_section_signature
section_size compressed entry_count index_base
has_predefined_code ( attribute_value_codetable |
compressed_attribute_value_codetable )
N.B.: The index into each code table is used as the code unless there is a
predefined code. The code tables allow the mapping between the
codes used for the encoding and the actual values. The base of the
index for each table is specified in the field index_base of that table.
Only positive codes are allowed. Hence, index_base cannot have a
value of zero.
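
The indexing rule in the note above can be sketched in Python as follows. This is only an illustrative in-memory model: the class name CodeTable, its methods, and the use of bytes for a predefined code are assumptions, not part of the encoding defined here. An empty predefined code is treated as "no predefined code".

```python
from typing import List, Optional, Tuple, Union

Code = Union[int, bytes]  # index-derived code, or a predefined code (byte array)


class CodeTable:
    """Hypothetical in-memory model of an uncompressed code table."""

    def __init__(self, index_base: int) -> None:
        # Only positive codes are allowed, so index_base cannot be zero.
        if index_base <= 0:
            raise ValueError("index_base must be greater than 0")
        self.index_base = index_base
        self.entries: List[Tuple[str, Optional[bytes]]] = []

    def add(self, value: str, predefined_code: Optional[bytes] = None) -> Code:
        """Append an entry and return the code that identifies it."""
        self.entries.append((value, predefined_code))
        return self.code_at(len(self.entries) - 1)

    def code_at(self, position: int) -> Code:
        """Use the predefined code if present, else index_base + position."""
        _, predefined = self.entries[position]
        return predefined if predefined else self.index_base + position

    def value_of(self, code: Code) -> str:
        """Map a code back to the actual value."""
        for position, (value, _) in enumerate(self.entries):
            if self.code_at(position) == code:
                return value
        raise KeyError(code)


table = CodeTable(index_base=1)
title_code = table.add("title")                # no predefined code: 1 + 0
creator_code = table.add("creator", b"\x45")   # the predefined code is used
```

A decoder would perform the reverse mapping with value_of when it meets a code in the document hierarchy.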

element_name_codetable_section_signature ::= #xFF04 
attribute_name_codetable_section_signature ::= #xFF05 
attribute_value_codetable_section_signature ::= #xFF06
has_predefined_code ::= boolean

N.B.: The has_predefined_code flag specifies whether the code table has a
predefined_code column.

Element name code table

element_name_codetable ::= element_name_codetable_entry*

N.B.: element_name_codetable defines the structure of the uncompressed 
element name code table. The index into the table is used as the 
element name code unless there is a predefined code. The base of the 
index is specified in the field index_base. The code 0 is reserved. 
Hence, a code table cannot have an index_base of 0.

element_name_codetable_entry ::= ns_code element_name type_ID [
predefined_code ]

N.B.: Except for the built-in datatypes and special types that are known to
the encoder, textual content of all other types will be encoded as string.
predefined_code ::= byte_array

N.B.: An empty predefined_code implies that there is no predefined code
for that entry. This should not happen. If a value is missing from a
predefined code table, the encoder has to generate a code for the
value and store it in the predefined_code field.

element_name ::= non_empty_UTF8_string | ( #x0000 string_locator )

N.B.: The element names are usually stored in-line in the table. However,
if an element name is too long, it can be stored in a separate text
packet and a string locator is used in the table instead.
string_locator ::= text_packet_number byte_offset

N.B.: A byte_offset specifies the offset into the text packet's body where 
the string can be found. 
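
As a minimal sketch of how a decoder might resolve a name that is stored either in-line or through a string locator, consider the following Python. The names StringLocator and resolve_name, the dictionary of text packet bodies, and the assumption that a string in a text packet body is NUL-terminated UTF-8 are all illustrative; the actual string encoding is the one defined for UTF8_string.

```python
from typing import Dict, NamedTuple, Union


class StringLocator(NamedTuple):
    text_packet_number: int
    byte_offset: int  # offset into the text packet's body


Name = Union[str, StringLocator]  # in-line name, or a locator to a long name


def resolve_name(name: Name, text_packets: Dict[int, bytes]) -> str:
    """Return the in-line name, or fetch it from the referenced text packet."""
    if isinstance(name, str):
        return name
    body = text_packets[name.text_packet_number]
    # Assumption: the string is NUL-terminated UTF-8 within the packet body.
    end = body.index(b"\x00", name.byte_offset)
    return body[name.byte_offset:end].decode("utf-8")


# Hypothetical text packet 7 holding two strings.
packets = {7: b"short\x00AVeryLongElementName\x00"}
long_name = resolve_name(StringLocator(7, 6), packets)
```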
non_empty_UTF8_string ::= UTF8_string - 

Attribute name code table

attribute_name_codetable ::= attribute_name_codetable_entry*

N.B.: attribute_name_codetable defines the structure of the uncompressed 
attribute name code table. The index into the table is used as the 
attribute name code unless there is a predefined code. The base of the 
index is specified in the field index_base. The code 0 is reserved.
Hence, a code table cannot have an index_base of 0.
attribute_name_codetable_entry ::= ns_code attribute_name type_ID [
predefined_code ]

N.B.: Except for the built-in datatypes and special types that are known to
the encoder, textual content of all other types will be encoded as string.

attribute_name ::= non_empty_UTF8_string | ( #x0000 string_locator )

N.B.: The attribute names are usually stored in-line in the table. However, 
if an attribute name is too long, it can be stored in a separate text 
packet and a string locator is used in the table instead. 


Attribute value code table 

attribute_value_codetable ::= attribute_value_codetable_entry*

N.B.: attribute_value_codetable defines the structure of the uncompressed
attribute value code table. The index into the table is used as the
attribute value code unless there is a predefined code. The base of the
index is specified in the field index_base. The code 0 is reserved.
Hence, a code table cannot have an index_base of 0.
attribute_value_codetable_entry ::= ns_code attribute_value type_ID [
predefined_code ]

N.B.: Except for the built-in datatypes and special types that are known to
the encoder, textual content of all other types will be encoded as string.
attribute_value ::= encoded_value

N.B.: The attribute values are usually stored in-line in the table.
encoded_value ::= encoded_value_of_non_string_type | non_empty_UTF8_string | (
#x00 ) | ( #x0000 string_locator )
N.B.: Values are encoded according to their types. Except for built-in
datatypes and special types that are known to the encoder, values are
encoded as string.

N.B.: An empty UTF8_string has to be followed by #x00 to distinguish it
from a valid string locator. Again, if an attribute value is too long, it
can be stored in a separate text packet and a string locator is used in
the table instead.

Attribute Name/Value Pair Section

attribute_name_value_pair_section ::= attribute_name_value_pair_section_signature
section_size compressed entry_count index_base (
attribute_name_value_pair_table |
compressed_attribute_name_value_pair_table )
attribute_name_value_pair_table ::= attribute_name_value_pair_entry*

N.B.: attribute_name_value_pair_table defines the structure of the
uncompressed attribute name/value pair table. The base of the index
(> 0) is specified in the field index_base.
attribute_name_value_pair_entry ::= attribute_name_code attribute_value_code
attribute_name_value_pair_section_signature ::= #xFF07

Document Hierarchy Section

document_hierarchy_section ::= document_hierarchy_section_signature section_size
compressed ( subtree | compressed_subtree )
subtree ::= node

N.B.: subtree defines the structure of the uncompressed XML sub-tree.

node ::= node_size node_type ( element_node | text_node |
comment_node | node_locator )
N.B.: The node_size includes the size of the node and its descendant nodes
encoded in the same packet.

node_type ::= ( element_node_signature element_flag ) |
( ( text_node_signature | comment_node_signature |
locator_node_signature ) #x0 )
element_node_signature ::= #x3
text_node_signature ::= #x5
comment_node_signature ::= #x9
locator_node_signature ::= #xC

element_flag ::= has_attributes | has_children | has_attributes_and_children 
has_attributes ::= 0x1
has_children ::= 0x2 
has_attributes_and_children ::= 0x3 
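
The node_type field pairs a 4-bit signature with a 4-bit flag. The following Python is a sketch under the assumption that the two are packed into a single byte with the signature in the high nibble; that layout and the function names are illustrative and are not stated by the grammar itself.

```python
from typing import Tuple

ELEMENT, TEXT, COMMENT, LOCATOR = 0x3, 0x5, 0x9, 0xC
HAS_ATTRIBUTES, HAS_CHILDREN = 0x1, 0x2  # 0x3 means both


def pack_node_type(signature: int, flag: int = 0x0) -> int:
    """Combine signature and flag into one byte (assumed nibble layout)."""
    if signature != ELEMENT and flag != 0x0:
        raise ValueError("only element nodes carry an element_flag")
    return (signature << 4) | (flag & 0x0F)


def unpack_node_type(node_type: int) -> Tuple[int, bool, bool]:
    """Return (signature, has_attributes, has_children)."""
    signature, flag = node_type >> 4, node_type & 0x0F
    return signature, bool(flag & HAS_ATTRIBUTES), bool(flag & HAS_CHILDREN)


element_with_both = pack_node_type(ELEMENT, HAS_ATTRIBUTES | HAS_CHILDREN)
```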

element_node ::= element_name_code [ attributes ] [ child_node* ]

child_node ::= node

attributes ::= index_of_starting_attribute_name_value_pair
number_of_attributes

number_of_attributes ::= variable_length_natural_number
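
An element refers to its attributes by a starting index into the attribute name/value pair table and a count. A minimal sketch of that lookup is given below; it assumes the attributes of one element occupy consecutive entries and that the starting index is relative to the pair table's index_base. The function name and the dictionaries standing in for the decoded code tables are hypothetical.

```python
from typing import Dict, List, Tuple


def element_attributes(
    start_index: int,
    count: int,
    pair_table: List[Tuple[int, int]],  # (attribute_name_code, attribute_value_code)
    pair_index_base: int,
    names: Dict[int, str],              # attribute name code -> name
    values: Dict[int, str],             # attribute value code -> value
) -> List[Tuple[str, str]]:
    """Resolve an element's attributes from a run of consecutive pair entries."""
    first = start_index - pair_index_base  # table position of the first pair
    if first < 0 or first + count > len(pair_table):
        raise IndexError("attribute run lies outside the pair table")
    return [(names[n], values[v]) for n, v in pair_table[first:first + count]]


# Illustrative tables; the codes and values are made up for the example.
names = {1: "id", 2: "lang"}
values = {1: "shot1", 2: "en"}
pairs = [(1, 1), (2, 2)]
attrs = element_attributes(1, 2, pairs, 1, names, values)
```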

text_locator ::= string_locator

comment_node ::= text_locator
node_locator ::= packet_number
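
Because a node locator holds only a packet number, a decoder can rebuild a hierarchy that was split across structure packets by following the locators. The Python below is a sketch of that idea on already-decoded nodes. The dictionary layout of a node, the function name and the sample element names are assumptions made for illustration, and cyclic locators are not handled.

```python
from typing import Dict, Union

# A node is either a locator {"locator": packet_number} or an element
# {"name": str, "children": [...]}; this layout is illustrative only.
Node = Dict[str, Union[str, int, list]]


def resolve(node: Node, structure_packets: Dict[int, Node]) -> Node:
    """Replace every locator node by the sub-tree held in the packet it names."""
    if "locator" in node:
        return resolve(structure_packets[node["locator"]], structure_packets)
    children = [resolve(child, structure_packets)
                for child in node.get("children", [])]
    return {"name": node["name"], "children": children}


# Hypothetical structure packets: packet 1 links to 2, which links to 3.
structure_packets = {
    1: {"name": "Mpeg7", "children": [{"name": "Title", "children": []},
                                      {"locator": 2}]},
    2: {"name": "Description", "children": [{"locator": 3}]},
    3: {"name": "Summary", "children": []},
}
tree = resolve(structure_packets[1], structure_packets)
```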

Text Packet 

text_body ::= compressed ( encoded_text | compressed_encoded_text ) 

encoded_text ::= ( 0x00 encoded_value* ) | ( next_packet_number UTF8_string
)

N.B.: If next_packet_number is zero, the first string of the text packet may
be the last fragment of a long string. If next_packet_number is non-zero,
the whole text packet contains a single fragment of a string.
next_packet_number ::= variable_length_natural_number
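
The chaining described in the note can be sketched as follows. The code works on a hypothetical decoded form of a text packet, a pair of next_packet_number and the list of strings in the packet; that form and the function name are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical decoded form of a text packet: (next_packet_number, strings).
TextPacket = Tuple[int, List[str]]


def read_long_string(first_packet: int, text_packets: Dict[int, TextPacket]) -> str:
    """Join the fragments of a string that spans several text packets."""
    fragments = []
    number = first_packet
    while True:
        next_number, strings = text_packets[number]
        fragments.append(strings[0])
        if next_number == 0:   # final packet: its first string ends the chain
            return "".join(fragments)
        number = next_number   # non-zero: the packet held exactly one fragment


text_packets = {
    4: (5, ["A very long "]),
    5: (6, ["piece of "]),
    6: (0, ["text.", "next value"]),
}
long_text = read_long_string(4, text_packets)
```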

Command Packet

command_body ::= command path packet_number_of_subtree 

N.B.: The subtree to be added is defined in the structure packet with the
specified packet number.
command ::= insert_command | append_command

insert_command ::= #x01
append_command ::= #x02
path ::= URI_reference

URI_reference ::= UTF8_string



End of Appendix

CLAIMS:

1. A method of communicating at least part of a structure of a document described by
a hierarchical representation, said method comprising the steps of:

identifying said representation of said document;

packetizing said representation into a plurality of data packets, said packets having
a predetermined size, said packetizing comprising creating at least one link between a pair
of said packets, said link representing an interconnection between corresponding
components of said representation; and

forming said data packets into a stream for communication wherein said links
maintain said representation within said packets.

2. A method according to claim 1 further comprising the steps of: 
receiving said stream; 

decoding said packets from said stream to identify said links;

using said links to reconstruct said representation for those portions of said
representation not packetized with one packet of said stream. 

3. A method according to claim 1 or 2 wherein said corresponding components 
comprise at least one structure component and a one content component of said document.

4. A method of communicating at least part of the structure of a document described 
by a hierarchical representation, said method comprising the steps of:

identifying at least one part of said representation and packetizing said parts into at
least one packet of predetermined size; and

where any one or more of said parts of said representation do not fit within one
said packet, defining at least one link from said one packet to at least one further said
packet into which said non-fitting parts are packetized, said link maintaining the
hierarchical structure of said document in said packets.

5. A method according to any one of claims 1 to 4 wherein said hierarchical 
representation comprises a tree representation.

6. A method according to any one of claims 1 to 5 wherein said document comprises 
an XML document. 

7. A method according to any one of claims 1 to 6 wherein said predetermined size is 
a predetermined maximum size.

8. A method of facilitating access to the structure of an XML document, said method 
comprising the steps of: 

identifying a hierarchical representation of said document; 
packetizing said representation into a plurality of packets of predetermined packet
size;

forming links between said packets to define those parts of said representation not
able to be expressed within a packet thereby enabling reconstruction of said
representations after de-packetizing.

9. A method of encoding an XML document, said method comprising the steps of:

examining said XML document to identify each data type forming part of said
XML document;

identifying a first set of said data types for which a corresponding special encoding
format is available;

first encoding each part of said XML document having a data type in said first set
with the corresponding special encoding format;

second encoding each remaining part of said XML document with a default
encoding format corresponding to the data type of said remaining part;

forming a representation of information referencing at least each said data type in
said first set with the corresponding special encoding format; and

associating said representation and said encoded parts as an encoded form of said 
XML document. 

10. A method according to claim 9 wherein said encoding separately encodes structure
and content parts of said XML document, and said representation is retained in a header
portion of the encoded form of said XML document.

11. A method according to claim 10 wherein said representation is retained in said
header portion as a table.

12. A method according to claim 10 wherein said separately encoded parts comprise
packets of at least one bitstream.

13. A method according to claim 9 wherein said first encoding comprises examining
said special data type and said corresponding part and determining a one of said encoding 
formats to be applied to said part. 

14. A method according to claim 9 wherein said encoding comprises selecting one of a
plurality of said encoding formats corresponding to a data type of said part and encoding
said part with said selected encoding format.

15. A method of decoding an encoded XML document, said method comprising the 
steps of:

examining said encoded XML document to identify an encoding format associated
with each data type forming part of said XML document; and

decoding each said part using a decoder complementing the encoding format with
which the corresponding data type was encoded.

16. A method according to claim 15 wherein said encoded XML document comprises 
separately encoded structure parts and content parts of said XML document and said 
examining comprises identifying from within at least one structure part a representation
associating an encoded part of said XML document with the corresponding encoding
format.

17. A method according to claim 16 wherein said encoded XML document is formed 
as a plurality of packets and said representation is formed within a header of at least one 
said packet.

18. A method according to claim 17 wherein said representation comprises a table.

19. A method according to any one of claims 16 to 18 wherein said representation
comprises a URI.

20. A method according to claim 16 wherein said representation relates to at least a set 
of special data types having corresponding decoding formats.

21. A method according to claim 15 wherein if a decoder is not available to decode a
part of said encoded XML document, said method comprises ignoring said part. 

22. A packetized bitstream formed using the method of any one of claims 1 to 14. 

23. Apparatus for communicating at least part of a structure of a document described
by a hierarchical representation, said apparatus comprising:

means for identifying said representation of said document;
means for packetizing said representation into a plurality of data packets, said
packets having a predetermined size, said packetizing comprising creating at least one link
between a pair of said packets, said link representing an interconnection between
corresponding components of said representation; and

means for forming said data packets into a stream for communication wherein said
links maintain said representation within said packets.

24. Apparatus for encoding an XML document, said apparatus comprising:

means for examining said XML document to identify each data type forming part 
of said XML document; 

means for identifying a first set of said data types for which a corresponding 
special encoding format is available;

means for first encoding each part of said XML document having a data type in
said first set with the corresponding special encoding format;

means for second encoding each remaining part of said XML document with a
default encoding format corresponding to the data type of said remaining part;

means for forming a representation of information referencing at least each said
data type in said first set with the corresponding special encoding format; and

means for associating said representation and said encoded parts as an encoded 
form of said XML document. 

25. A computer readable medium, having a program recorded thereon, where the
program is configured to make a computer execute a procedure for communicating at least
part of a structure of a document described by a hierarchical representation, said program
comprising steps for:

identifying said representation of said document;

packetizing said representation into a plurality of data packets, said packets having
a predetermined size, said packetizing comprising creating at least one link between a pair
of said packets, said link representing an interconnection between corresponding
components of said representation; and

forming said data packets into a stream for communication wherein said links
maintain said representation within said packets.



26. A computer readable medium according to claim 25 further comprising the steps 
for:

receiving said stream;

decoding said packets from said stream to identify said links;
using said links to reconstruct said representation for those portions of said
representation not packetized with one packet of said stream.

27. A computer readable medium according to claim 25 or 26 wherein said 
corresponding components comprise at least one structure component and a one content 
component of said document. 

28. A computer readable medium, having a program recorded thereon, where the
program is configured to make a computer execute a procedure for communicating at least
part of the structure of a document described by a hierarchical representation, said program
comprising steps for:

identifying at least one part of said representation and packetizing said parts into at
least one packet of predetermined size; and

where any one or more of said parts of said representation do not fit within one
said packet, defining at least one link from said one packet to at least one further said
packet into which said non-fitting parts are packetized, said link maintaining the
hierarchical structure of said document in said packets.

29. A computer readable medium according to any one of claims 25 to 28 wherein said
hierarchical representation comprises a tree representation. 

30. A computer readable medium according to any one of claims 25 to 29 wherein said
document comprises an XML document. 

31. A computer readable medium according to any one of claims 25 to 30 wherein said 
predetermined size is a predetermined maximum size. 


32. A computer readable medium, having a program recorded thereon, where the
program is configured to make a computer execute a procedure for facilitating access to the
structure of an XML document, said program comprising steps for:

identifying a hierarchical representation of said document;

packetizing said representation into a plurality of packets of predetermined packet
size;

forming links between said packets to define those parts of said representation not
able to be expressed within a packet thereby enabling reconstruction of said
representations after de-packetizing.

33. A computer readable medium, having a program recorded thereon, where the 
program is configured to make a computer execute a procedure to encode an XML
document, said program comprising steps for:

examining said XML document to identify each data type forming part of said
XML document;

identifying a first set of said data types for which a corresponding special encoding
format is available;

first encoding each part of said XML document having a data type in said first set
with the corresponding special encoding format;

second encoding each remaining part of said XML document with a default
encoding format corresponding to the data type of said remaining part;

forming a representation of information referencing at least each said data type in
said first set with the corresponding special encoding format; and

associating said representation and said encoded parts as an encoded form of said 
XML document. 

34. A computer readable medium according to claim 33 wherein said encoding 
separately encodes structure and content parts of said XML document, and said
representation is retained in a header portion of the encoded form of said XML document.

35. A computer readable medium according to claim 34 wherein said representation is
retained in said header portion as a table.

36. A computer readable medium according to claim 34 wherein said separately
encoded parts comprise packets of at least one bitstream.

37. A computer readable medium according to claim 33 wherein said first encoding
comprises examining said special data type and said corresponding part and determining a 
one of said encoding formats to be applied to said part. 

38. A computer readable medium according to claim 33 wherein said encoding
comprises selecting one of a plurality of said encoding formats corresponding to a data
type of said part and encoding said part with said selected encoding format. 

39. A computer readable medium, having a program recorded thereon, where the 
program is configured to make a computer execute a procedure to decode an encoded
XML document, said program comprising steps for:

examining said encoded XML document to identify an encoding format associated
with each data type forming part of said XML document; and

decoding each said part using a decoder complementing the encoding format with
which the corresponding data type was encoded.

40. A computer readable medium according to claim 39 wherein said encoded XML 
document comprises separately encoded structure parts and content parts of said XML 
document and said examining comprises identifying from within at least one structure part
a representation associating an encoded part of said XML document with the
corresponding encoding format. 

41. A computer readable medium according to claim 40 wherein said encoded XML 
document is formed as a plurality of packets and said representation is formed within a
header of at least one said packet.

42. A computer readable medium according to claim 41 wherein said representation
comprises a table.

43. A computer readable medium according to any one of claims 40 to 42 wherein said
representation comprises a URI. 

44. A computer readable medium according to claim 43 wherein said representation
relates to at least a set of special data types having corresponding decoding formats.

45. A computer readable medium according to claim 44 wherein if a decoder is not 
available to decode a part of said encoded XML document, said method comprises 
ignoring said part. 

46. Apparatus for communicating at least part of a structure of a document described
by a hierarchical representation, said apparatus comprising:

an identifying unit which identifies said representation of said document;
a packetizing unit which packetizes said representation into a plurality of data
packets, said packets having a predetermined size, said packetizing comprising creating at
least one link between a pair of said packets, said link representing an interconnection
between corresponding components of said representation; and

a forming unit which forms said data packets into a stream for communication
wherein said links maintain said representation within said packets.

47. Apparatus for encoding an XML document, said apparatus comprising:

an examining unit which examines said XML document to identify each data type 
forming part of said XML document; 

an identifying unit which identifies a first set of said data types for which a 
corresponding special encoding format is available;

a first encoding unit which encodes each part of said XML document having a data
type in said first set with the corresponding special encoding format;

a second encoding unit which encodes each remaining part of said XML document
with a default encoding format corresponding to the data type of said remaining part;

a forming unit which forms a representation of information referencing at least
each said data type in said first set with the corresponding special encoding format; and

an associating unit which associates said representation and said encoded parts as 
an encoded form of said XML document. 

48. A method of encoding a document described by a hierarchical representation
substantially as described herein with reference to the drawings.

49. A method of decoding a packetized stream incorporating a hierarchical
representation substantially as described herein with reference to the drawings.

50. A method of communicating a document described using a hierarchical
representation substantially as described herein with reference to the drawings.

51. Apparatus for performing the method of any one of claims 48 to 50.

52. A computer readable medium having a computer program recorded thereon for
performing the method of any one of claims 48 to 50.